Tiling Nested Loops into Maximal Rectangular Blocks

Authors

  • Yeong-Sheng Chen
  • Sheng-De Wang
  • Chien-Min Wang
Abstract

techniques are described as follows. Wolfe discusses the techniques of strip mining and iteration space tiling [25], which organize the computations in the original loops into chunks of equal size to take advantage of vector registers, caches, or local memory. Nicolau [13] proposes a method called loop quantization to partition nested loops. King and Li [9] discuss the grouping of loop iterations for parallel execution on multicomputers. The cycle shrinking method proposed by Polychronopoulos [16] uses the data dependence graphs of loops to determine which loops can be executed in parallel. Irigoin and Triolet [8] present a method called supernode partitioning, which formulates a general condition for determining the admissible tiles formed by multiple hyperplanes. Schreiber and Dongarra [20] discuss a heuristic method of choosing a subset of dependence vectors for tiling loops. Ramanujam and Sadayappan [18] formulate an approach to determine the hyperplanes and the tile sizes for reducing the communication cost in distributed memory systems. As for code generation for loop tiling, Ancourt and Irigoin [1] discuss how to generate code for the tiles defined by a set of linear equations, and the code generation techniques for nonunimodular loop transformations [12, 17] can also provide a solution to this problem.

In this paper, we propose a new approach to tiling nested loops for exploiting parallelism. The proposed approach aims at aggregating as many independent computations as possible into a tile in order to maximize parallelism. First, we describe a systematic procedure to find all the computations that are independent when the loops are to be executed. These independent computations are collected together as a union of sets, which are called initially independent computation sets. Then, we show that all the initially independent computations can be aggregated into rectangular blocks. On this basis, the original loops can be partitioned into rectangular blocks to maximize parallelism. Since each partitioned region is block-shaped, code generation is straightforward. We also show that if the wavefront transformation is combined with the proposed method, the original loops can always be tiled so that the tile size is greater than one.

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING 35, 123–132 (1996) ARTICLE NO. 0075
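The idea of grouping independent iterations into rectangular blocks can be illustrated with a small sketch. It is not the authors' procedure: the loop body, the dependence distance (2, 3), the block height of two rows, and the function names `run_sequential` and `run_tiled` are all assumptions chosen for illustration. Because iteration (i, j) only reads a value written two rows earlier, every iteration inside a block of two consecutive rows is independent of the others in that block, so the block can be executed in any order (here, a shuffled order stands in for parallel execution).

```python
import random


def run_sequential(n, m):
    """Reference: run the loop nest in its original lexicographic order."""
    a = [[i * m + j for j in range(m)] for i in range(n)]
    for i in range(2, n):
        for j in range(3, m):
            # Flow dependence with distance vector (2, 3).
            a[i][j] = a[i - 2][j - 3] + 1
    return a


def run_tiled(n, m, tile_rows=2, seed=0):
    """Run the same loop as rectangular blocks of `tile_rows` full rows.

    Blocks are visited in order; iterations inside one block are shuffled
    to mimic parallel execution. This is only valid for tile_rows <= 2,
    because the dependence spans exactly two rows.
    """
    assert 1 <= tile_rows <= 2, "taller blocks would contain dependent iterations"
    a = [[i * m + j for j in range(m)] for i in range(n)]
    rng = random.Random(seed)
    for i0 in range(2, n, tile_rows):
        block = [(i, j)
                 for i in range(i0, min(i0 + tile_rows, n))
                 for j in range(3, m)]
        rng.shuffle(block)  # any order is legal inside an independent block
        for i, j in block:
            a[i][j] = a[i - 2][j - 3] + 1
    return a


if __name__ == "__main__":
    print(run_tiled(7, 9) == run_sequential(7, 9))
```

Each block reads only rows that were completed by an earlier block (or never written), which is why the shuffled execution matches the sequential result. For this particular loop, blocks of three columns would be an equally valid rectangular choice.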

Similar resources

PrimeTile: A Parametric Multi-Level Tiler for Imperfect Loop Nests

Tiling is a crucial loop transformation for generating high performance code on modern architectures. Efficient generation of multi-level tiled code is essential for maximizing data reuse in systems with deep memory hierarchies. Tiled loops with parametric tile sizes (not compile-time constants) facilitate runtime feedback and dynamic optimizations used in iterative compilation and automatic tu...

On Parameterized Tiled Loop Generation and Its Parallelization

Tiling is a loop transformation that decomposes computations into a set of smaller computation blocks. The transformation has proved to be useful for many high-level program optimizations, such as data locality optimization and exploiting coarse-grained parallelism, and crucial for architecture with limited resources, such as embedded systems, GPUs, and the Cell. Data locality and parallelism w...

An Efficient Code Generation Technique for Tiled Iteration Spaces

This paper presents a novel approach for the problem of generating tiled code for nested for-loops, transformed by a tiling transformation. Tiling or supernode transformation has been widely used to improve locality in multi-level memory hierarchies, as well as to efficiently execute loops onto parallel architectures. However, automatic code generation for tiled loops can be a very complex comp...

Tiling and Scheduling of Three-level Perfectly Nested Loops with Dependencies on Heterogeneous Systems

Nested loops are one of the most time-consuming parts and the largest sources of parallelism in many scientific applications. In this paper, we address the problem of 3-dimensional tiling and scheduling of three-level perfectly nested loops with dependencies on heterogeneous systems. To exploit the parallelism, we tile and schedule nested loops with dependencies by awareness of computational po...

Code Generation Methods for Tiling Transformations

Tiling or supernode transformation has been widely used to improve locality in multi-level memory hierarchies, as well as to efficiently execute loops onto parallel architectures. However, automatic code generation for tiled loops can be a very complex compiler work due to non-rectangular tile shapes and arbitrary iteration space bounds. In this paper, we first survey code generation methods fo...

Journal title:
  • J. Parallel Distrib. Comput.

Volume 35

Pages 123–132

Publication date 1996